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Abstract With this contribution, we shed light on the relation between the 
discrete adjoints of multistep backward differentiation formula (BDF) methods 
and the solution of the adjoint differential equation. To this end, we develop 
a functional-analytic framework based on a constrained variational problem 
and introduce the notion of weak adjoint solutions. We devise a finite element 
Petrov-Galcrkin interpretation of the BDF method together with its discrete 
adjoint scheme obtained by reverse internal numerical differentiation. We show 
how the finite element approximation of the weak adjoint is computed by the 
discrete adjoint scheme and prove its asymptotic convergence in the space of 
normalized functions of bounded variation. We also obtain asymptotic con- 
vergence of the discrete adjoints to the classical adjoints on the inner time 
interval. Finally, we give numerical results for non-adaptive and fully adaptive 
BDF schemes. The presented framework opens the way to carry over the exist- 
ing theory on global error estimation techniques from finite element methods 
to BDF methods. 
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1 Introduction 

Consider a nonlinear initial value problem (I VP) in ordinary differential equa- 
tions (ODE) with sufficiently smooth right hand side / : [t a ,t[] x R d -+ R d 
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y(t) = f(t,y(t)), te(t s ,U] (la) 
y(t s ) = y s - (lb) 

Consider also a differentiable criterion of interest J depending on the final state 
y(tf) of the solution of ([1]). This is relevant whenever one is not interested in 
the whole solution trajectory y(t) or even the final state y(t{) 7 but only in a 
functional output of these quantities. Note that by standard reformulations 
(cf. [131 P-93], [SI P-25]) this setting also captures the cases of a parameter- 
dependent right hand side f(t,y,p) and a criterion of interest of Bolza type 
J{y) = j t t l J 1 (y(t),p)dt + Mvfa)). 

The adjoint differential equation corresponding to the evaluation of J(y(tf)) 
in the solution of (TTJ) is (see Section [2]) 

\(t) = -fl(t,y(t))\(t), te(t u t a ] (2a) 

A(if) = J'(y(U)y. (2b) 

The adjoint solution describes the dependency of J(y(t{)) on disturbances of 
the nominal solution y(t). Therefore, it is of great importance in the solu- 
tion of optimal control problems. For example, in indirect approaches based 
on the Pontryagin minimum principle, @ appears as part of the optimality 
conditions. 

For an approximation of the solution of (TTJ, the solution of the adjoint 
differential equation ([2J can be computed in two different ways, the continu- 
ous adjoint approach or the discrete adjoint approach. The former solves the 
adjoint differential equation by numerical integration, see for example 
Whereas the latter applies automatic differentiation techniques to the numer- 
ical integration scheme. This approach, firstly presented in [7J, is known as 
internal numerical differentiation (IND). It has significant advantages in di- 
rect derivative-based approaches for the solution of optimal control problems 
that use integrators, e.g. direct single and multiple shooting. 

In the case of Runge-Kutta methods, the discrete adjoint scheme generated 
by adjoint IND is itself a Runge-Kutta scheme for the adjoint differential 
equation (jSJ , and thus gives a convergent approximation to the adjoint solution 
[71l24j . In the case of continuous and discontinuous Galcrkin methods applied 
to (fT]), the discrete adjoint scheme yields an approximation to the solution of 
© (see e.g. Q~5]). The discrete adjoints of discontinuous Galerkin methods for 
compressible Navier-Stokes equations are, for example, considered in j!4) . 

The situation becomes significantly more complex in the case of multistep 
methods, as the discrete adjoint schemes of linear multistep methods (LMM) 
are generally not consistent with the adjoint differential equation ([2]). But they 
still provide approximations of the sensitivities J\y(tf))~jjy a t the initial 
time t s that converge with the rate of the nominal LMM [8,22]. Due to this 
property, the multistep BDF method and its discrete adjoint scheme are used 
successfully in direct methods for the solution of optimal control problems, 
e.g. in direct multiple shooting 
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In this contribution, we focus on the relation between the discrete ad- 
joints of variable-order variablc-stepsize BDF methods and the adjoints de- 
fined by ([2]) . To this end, we construct a suitable constrained variational prob- 
lem (CVP) in a Banach space setting using the duality pairing between the 
space of continuous functions and its dual, the space of normalized functions 
of bounded variation. It turns out that the adjoint of a stationary point of 
this CVP is the normalized integral of the solution of the Hilbert space ad- 
joint differential equation (|2|). Motivated by PDE nomenclature, we will call 
it a weak solution of © or shortly weak adjoint. We apply Petrov-Galcrkin 
techniques, and show that with the appropriate choice of basis functions the 
infinite-dimensional optimality conditions of the CVP arc approximated by 
the BDF method and its discrete adjoint scheme obtained by adjoint internal 
numerical differentiation of the nominal BDF scheme. In particular, we obtain 
that discretization and optimization commute in this Banach space setting. 
Finally, we prove that the finite element approximation of the weak adjoint, 
which can be computed by a simple post-processing of the discrete adjoints, 
converges to the weak adjoint on the entire time interval. This result is based 
on the linear convergence of the discrete adjoints to the solution of J2} on the 
inner time interval which is shown as well. 

This paper is organized as follows. In Section[5]wc derive the adjoint differ- 
ential equation as part of the optimality conditions of an infinite-dimensional 
constrained variational problem in Hilbert spaces. The BDF method and its 
discrete adjoint scheme generated by internal numerical differentiation tech- 
niques are then described in Section [3] In Section 2] we present the optimal- 
ity conditions of the constrained variational problem embedded into the Ba- 
nach space of all continuously differentiable functions. After showing the well- 
posedness of the optimality conditions and their relation to the Hilbert space 
optimality conditions, we extend the setting to capture the space of all func- 
tions that are continuous and piecewise continuously differentiable. For the 
Petrov-Galerkin discretization of Section [5] we choose suitable finite element 
spaces that yield equivalence between the discrctized optimality conditions 
and the BDF scheme together with its discrete adjoint scheme. In Section [5] 
we start by proving the convergence of the discrete adjoints to the solution 
of the Hilbert space adjoint equation on the inner time interval. Using this 
result, we show the convergence of the finite element approximation to the 
weak adjoint solution. Section [7] presents numerical results on a nonlinear test 
case with analytic solutions. 



2 Initial value problems and their adjoints in a Hilbert space 
setting 

In this section, we derive the adjoint differential equation in a Hilbert space 
functional-analytic setting. Our goal is to specify the assumptions on the ini- 
tial value problem, to settle some notation, and to lay the groundwork for 
the constructions that follow. In particular, we make explicit the connection 
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between the adjoint differential equation and the Lagrange multiplier of the 
solution of a constrained variational problem in a Hilbcrt space setting based 
on the Sobolev spaces usually found in finite clement formulations. 

2.1 Existence, uniqueness and differentiability of the nominal solution 

Assume that the right hand side f(t, y) of (JXJ) is continuous on an open set 
T> C R x M. d with (t s ,y s ) g V and its first-order partial derivative f y (t,y) 
is continuous on T>. Thus, according to the Picard-Lindelof Theorem j!3j . 
problem ([1]) is well-posed in the sense of Hadamard, i.e. it admits a unique 
solution depending continuously on the input data. Beyond that, the solution 
y(t) is continuously differentiable on an open interval I, see [13] . and we 
assume that tf is chosen such that [t s ,t{] C X. Thus, the solution y(t) of ([T]) 
lies in the Banach space C 1 [t s ,t(] d of all continuously differentiable functions 
from [i s ,tf] to M. d equipped with the usual norm. Furthermore, the solution 
y(t) = y(t;t s ,y s ) is continuously differentiable with respect to y s and the 
derivatives w t (t) = dy(t;t s ,y s )/d(y a ) i solve [13] 

Wi{t) = f y (t,y(t))wi(t), t e (t s ,t f ] (3a) 
Wi(t s ) = et (3b) 

where is the ith unit vector, i g {1, . . . , d}. Moreover, Wi(t) exists uniquely 
and is continuously differentiable on [t s , if] since the partial derivative of the 
right hand side of ([3]) with respect to Wi is continuous in (t,Wi). The residual 
of dJaJ) 

P(l/) :=*(•) -/(•,»(•)) (4) 

lies in the Banach space C°[t s ,tf] d of all continuous functions from [t s ,t{] to 
R d equipped with the standard norm ||g|| c o[ ts tf ]d = Yli=i \\9i\\c [t a t f ] wnere 
\\9t\\c°[t a M] = max *e[t 5 ,tf] \3i(t)\- 

2.2 Lagrange multipliers and adjoint differential equations 

The core of this section is the identification of the adjoint as the Lagrange 
multiplier of a constrained optimization problem in a functional- analytic set- 
ting. The ideas described here are of course not new. However, the setting for 
the case of ordinary differential equations is fundamental for this contribu- 
tion. Since we have not found it in the literature, we include here a detailed 
derivation. 

Recall that functions in C°[t s ,tf] d , restricted to the open interval (t a ,t{), 
form a dense subset of the space L 2 (t s ,tf) d of all quadratically Lcbesguc- 
intcgrablc functions. Similarly, recall that the subset C 1 [t s ,if] <1 is dense in the 
Sobolev space H 1 (t s ,t[) d of all L 2 (t s , £f) d -functions with weak derivative in 
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L 2 (t s ,t f ) d (see [2 Ch.3]). Furthermore, both spaces L 2 (t s ,t f ) d and H 1 (t s ,t{) d 
are Hilbert spaces. 

Knowing this, we embed the initial value problem ([T]) into an optimization 
framework and derive the adjoint differential equation as part of the first- 
order necessary optimality conditions. To this end, we consider the constrained 
variational problem 

min J(y(t { )) (5a) 
y 

s.t. y(i) = /(*,»(*)), «e(«s,tf] (5b) 
y(t s ) = y s (5c) 

which is equivalent to evaluating J(y(t[)) in the solution of ([T}. Considering 
([5]) on the space H 1 (t s ,tf) d , the Hilbert space Lagrangian £ : H 1 (t s ,t[) d x 
L 2 (t s ,tf) d —> R of ([5|) using the L 2 -scalar product is 

£(y, A) := J(y(t f )) - J' \ r (t) [y(t) - f(t, y(t))] dt - A T (t s ) [y(t s ) - y s ] 

where A is the Lagrange multiplier. The optimality condition of (|S|) is based 
on the Frechet derivative of C at (y, A) in direction (w, x) which exists due 
to Frechet differentiability of J and [H Ch.0§0.2.5] 

£'(y, \)(w, X ) = C y (y, X)(w) + C x {y, \){ X ) 

rtt 



J'(y(t f ))w(tt) - / A T (t) [w(t) - f v (t, y(t))w(t)} dt - \i(t s )w(t s ) 

+ {- J t ' x J (t) W) - f(t,y(t))} dt - x J (t B ) [y(Q - y s ]j . 

The necessary condition for a stationary point (y, A) G i? 1 (t S) if) d x L 2 {t s , t() d 
of (|S|) is that £'(y, X)(w, x) = holds for all directions (to, x) G H 1 ^, t{) d x 
L 2 (t s ,t f ) d . Choosing w = Oe H l (t s ,t t ) d and only varying x G L 2 (t B ,tf) d the 
necessary condition reads 

C X T (t) [y(t) - f(t, y{t))] dt + x T (*s) [y(t s ) - y s ] = 0, Vx (6) 

which possesses the same unique solution y G C 1 [t s ,t[] d as (JXJ) . Taking now 
X = G L 2 (t s ,t{) d and only varying w G H 1 {t s ,tf) d one obtains using inte- 
gration by parts 



J'(l/(tf))-A T (t f )]ti;(t f )- / A(t) + /T(t,»(t))A(t) w(*)d* = 0, Vw 



which possesses the same solution as @. Under the assumptions of Section 
I2.fl the unique solution \(t) of (J2J) is continuously diffcrcntiable on [t B ,t{] and 
depends continuously on J'(y(tf)) J . 
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3 Efficient solution of initial value problems and sensitivity 
generation 

We now review the numerical solution of ODEs using BDF methods, and 
the corresponding sensitivity generation using automatic differentiation tech- 
niques. We briefly introduce BDF methods with an emphasis on the trajec- 
tories they define as functions of time. Then, we show how to obtain discrete 
adjoints in the BDF context, and review what is known so far about their 
relation to the solution of (J2|). 



3.1 Backward differentiation formula method 

This section follows the lines of [531 p-181fT and p.253f]. Consider the backward 
differentiation formula method 

yo = y s (7a) 

k„ 

^ Q=i Vn+i-i = hnf(t n+ i,y n+ i), n = 0,...,N-l (7b) 

with a self-starting procedure that begins with fco = 1 (implicit Euler) and in- 
creases successively the order of the steps until the maximum order is reached. 
Note that BDF methods are used up to order 6, since for higher order they 
become unstable. In practical implementations both the stepsize h n and the 
order k n are chosen adaptively to obtain better performance. The numerical 
solution is computed at discrete time points t s = to < t\ < ■ ■ • < ijv = tf 
with t n+ \ = t n + h n and y„ denotes the numerical approximation to the value 
y(t n ). The coefficients are determined by 

aj B) = /Ui B) (Wi), where L^(t)= f[ *Z*^=i ( 8 ) 

are the fundamental Lagrangian polynomials. Thus, the coefficients depend 
on the discrete time points and the order. In each step, the BDF method 
provides a polynomial approximation to the solution y(t) of JT]) in a natural 
way through the interpolation polynomial 

i=0 

also known as dense output. The composition of all these polynomials gives 
a continuous and piecewise continuously diffcrcntiablc approximation to the 
solution y(t) on the whole time interval [t S) tf]. 
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3.2 Adjoint differentiation of BDF integration schemes 

The basic idea of internal numerical differentiation [7J is to differentiate the dis- 
cretization scheme used to obtain the nominal approximations yo,yi, ■ ■ ■ , Un 
for specified adaptive components h n and k n using automatic differentiation 
(AD) techniques either in forward or in adjoint mode. Adjoint IND was first 
described in [8] for Runge-Kutta integration schemes and later on in [10] for 
BDF methods. Applying adjoint IND to the BDF scheme ([7J we obtain the 
discrete adjoint scheme 

c*o " *n - J'iVN) 1 = h N -ifl(t N ,y N )\ N (10a) 



E 



a\ n+l) \ n+ i +i = h n fT(t n+ i,y n +i)\i+i, n = N - 2, . . . , (10b) 



0<i<N- 
i<fc m 



with input direction J'^Vn) 1 and the convention = for i > k n , fc max = 
max n {fc„} (see also [12]). This scheme forms together with ([7]) the optimality 
conditions of the nonlinear program (NLP) 

min J{y N ) s.t. (11) 



v 



with y T '■= [Vq yl ■ ■ ■ y]y] ■ This NLP is a discretization of the constrained 
variational problem ([5]). 

The discrete adjoints given by (1101) are the exact derivatives of the nomi- 
nal integration scheme ([7J (beside round-off errors). Furthermore, for a BDF 
scheme with constant order fc, the discrete adjoint Ai converges with the same 
order k to the value X(t s ) of the adjoint solution of ©, cf. [51122]. 
The discrete adjoints are generally inconsistent approximations to the solu- 
tion of ([2]) around a nominal approximation passing through {y n }n=oi see 



Figure 1(b) In the case of constant order k and constant stepsizes h, the 
discrete adjoints coming from the adjoint initialization and adjoint termina- 
tion are inconsistent as well, whereas the main part, i.e. formula (|10b|) with 



n = N — k, . . . , k, gives consistent approximations of order k, see Figure 1(a) 
Due to the inconsistency of the discrete adjoint scheme (fTU)) with the ad 
joint differential equation ^ discretization and optimization of ([5]) do not 
commute in the commonly used Hilbert space setting. This gives rise to the 
question for a new functional-analytic setting that is suitable for multistep 
methods. The next sections are devoted to the development of this setting. 



4 Solution of the constrained variational problem in a Banach 
space setting 

As seen in the previous section, the Hilbert space setting of Section [2] is not 
suitable to analyze multistep methods and their discrete adjoints. Here, we 
propose to embed the constrained variational problem ([S]) into a Banach space 
setting and show the well-poscdness of the corresponding infinite-dimensional 
optimality conditions. 



s 
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(a) Non-adaptive BDF method with con- (b) Adaptive BDF method with variable 
stant order and constant stepsizes. order and variable stcpsizc. 

Fig. 1 Comparison of discrete adjoints \ h = [X^, A?}] 1 and analytic solution A = [Ai, A2] T 
of the Hilbert space adjoint differential equation on the Catenary test case (see Section 0. 



4.1 General considerations 

Duality pairing According to Section 12.11 the residual p{y) of ([lap is an el- 
ement of the space C°[t s ,ti] d . Thus, we focus on the duality pairing between 
the Banach space C°[t s ,t{] and its dual. The Riesz Representation Theorem 
[2"01 Ch.5§5.5] states that for every continuous linear functional £ on C°[t s ,t{] 
exists a unique W £ NBV[i s ,if] such that 

%] = 9) NBy[uMtC0[uM = J ' g(t)d#(t), (12) 

where the integral is the Riemann-Sticltjes integral [2TJ Ch.VIII§6]. The Ba- 
nach space NBV[t s ,tf] consists of all normalized functions of bounded varia- 
tion on [t s , tf] that are zero in t s and continuous from the right on (t s ,t{). It 
is equipped with the total variation norm 

m 

H^llNBV[W f ] = SU P ^ |<?(ti) - V (ti_i)| 

where the supremum is taken over all partitions t s = to < ■ ■ ■ < t m = tf of 
[t s , if]. According to the Riesz Representation Theorem, for each & the value 
of the total variation norm coincides with the value of the dual norm given by 



l*1lNBV[t s ,t f ] - .. ,. max , 

IMIc°rt B ,t f ] =1 



NBV[t s ,t f ],C°[t.,t f ] 



Hence, we will always use the norm that is better suited in the particular situ- 
ation. The dual of the finite Cartesian product C°[t s , tf] d is the finite Cartesian 
product NBV[t s ,tf] d of the duals with duality pairing 

"> d /iff ptf 

(*>ff>NBV<»,(CO)«l = I] (**>0i>NBV,C° = H / =■ / 

i=l i=l 

and dual norm | \W\ | NBV [t B ,t f ]<' = maxi<,< d ||*j|| NB v[t s ,t f ]! see E3 Ch.II§12.1]. 
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Variational formulation of the initial value problem The variational formula- 
tion of (JT|) on the described Banach spaces reads: Find y G C 1 [t s ,tf] d with 
2/ (i s ) = 2/s such that 

^ f y(i)-/(i,y(i))dr(i) =0 VreNBV[i s ,i f ] d . (13) 

This problem possesses at least one solution which is the strong solution given 
by |T]). The uniqueness follows from the fact that for continuous functions 
g G C°[t s ,t{\ it holds 



/ ' g(t)d&(t) = V«?eNBV[i s ,i f ] 
Ju 



.9 = 0. 



Thus, both formulations (TTJ) and (flU|) give the same solution y(i) and (|13[) is 
well-posed according to the well-posedness of (TTJ. 



4.2 Infinite-dimensional optimality conditions 

Considering the constrained variational problem (JS|) on the function space 
C^isjtf]'*, the Lagrangian £ : C 1 ^,^ x NBV[i s ,i f ] d x R d -> K is given by 

£(y, A, I) := J(y(t { )) - [ ' y(i) - /(t, y(i)) dA(i) - F _ y s ] ( W ) 

where the Lagrange multipliers I and A lie in the corresponding dual spaces M. d 
and NBV[i s , tf) d . The Lagrangian is based on the variational formulation (Tf"3"|) 
and includes the initial condition using an additional Lagrange multiplier. We 
first state the central theorem of this section which describes the stationary 
point of £ and defer the proof for the end of the section. 

Theorem 1 The optimality conditions of the constrained variational problem 
© on C 1 [t s ,tt] d , i-e- 

J'(y(t { ))w{t { ) - / w{t) - f y (t, y(t))w(t) dA(t) - Uw(U) = 0, (15a) 

- f f y(t)-f(t,y(t))dr(t) = o, (15b) 

-rT [y(i s ) - y s ] = 0, (15c) 

V(i»,r,r) g C^t^tff x NBV[i s ,i f ] d x R d , 

possess a unique solution (y,A,l) in C 1 [t s ,t[] d x NBV[i s ,if] d xR d . Moreover, 

y(i) is the solution of (ft}, o-nd I and A(t) are given in terms of the adjoint 
solution A(t) of (J2J 

I = A(i s ), A(t) = f A(r)dr, (16) 
with componentwise integration. 
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The necessary optimality condition for a stationary point (y, A, I) of the 
Lagrangian (|14[) is given by 

C y (y,A,l)(w)\ /0\ 

£ A (l/,A > i)(r) = , Vw&C%,t f ] d , reNBV[< s ,t f ] d , rel d 
A(y,A,Z)(r) / V / 

which is exactly (fT5j) . As equations f|15b[) - (|15c[) are already given by (fT3jl and 



discussed over there, we now focus on equation (|15a[) of the optimality con- 
ditions. Provided that y(t) is known, the adjoint problem in variational for- 
mulation reads: Find (A, I) G NBV[t s ,t { } d x R d such that (|T5a)) holds for all 

w g C%,tf] d . 

Lemma 1 For the solution y(t) of f)15b|> - f|15c|) . a corresponding adjoint so- 
lution (A, I) G NBV[t s ,t[] d x R d of (fT5a| is provided by (fT6|) . 



Proof Recall that the adjoint differential equation ^ has a unique solution 
A G C 1 [t s ,t f ] d (cf. Section Multiplying the transposed of fl2aj fr om the 
right by any w G C 1 [t s ,t{] d , integrating over [i s ,if] and adding the transposed 
of (|2b| multiplied by w(tf) yields 

A(i) + mt, y(t))\(t)] 1 w(t)dt - [A(tf) - J'(y(t f )) T ] T «>(tf) = 0. (17) 



Integration by parts gives for all id G C 1 [t s ,tf]' 1 

A T (i) [t»(t) - f y (t, y(t))w(t)} dt - \ T (t s )w(t s ) + J'(y(tf))w(ti) = 0. 



Consequently, (HI]) provides a solution (A, I) G NBV[f s ,i f ] d x R d of ([I5a) . 
since the indefinite integral .Aj(f) = J. Ai(r)dr is a normalized function of 
bounded variation QH Sec.32] and it holds g(t)dAi(t) = A^(t)g(t)dt = 
f£ Xi(t)g(t)dt, cf. EB Ch.VIII§6]. □ 

The next lemma proves the uniqueness of the weak adjoint solution. 

Lemma 2 For the solution y(t) of (|15b[) - (|15c[) . the corresponding adjoint so- 
lution (A, I) G NBV[t Sl t f ] d x R d of (TT5a|) is unique. 

Proof Equation (|15a|) is equivalent to 

w{t) - f y (t,y(t))w(t)dA(t) +Uw(t s ) = J'(y(t { ))w(t { ) Vw G C%f f ] d 
A. " ' 

" v ' =:B(w) 

=:A(A,l)(w) 

where B and A(A, I) are linear functionals on C 1 ^, t{] d and A : NBV[i s , if] d x 
R d -> (C 1 [i s ,i f ] d )' is linear in (A,Z). We have to show that A/"(A) = {(0,0)}, 
where the nullspace of A is given by 

M(A) = {(A, I) G NBV[t s ,tf] d x R d : A(A,l)(w)=0 Vw G C%i f ] d } . 
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Due to Section |2~T1 for every initial value Wi(t 8 ) £ R d there exists a function 
Wi G C 1 [t s ,tf] d that satisfies ([3"aj) . Inserting u>i in Z) then gives 

I)(t»i) = / ' Odvl(i) + Z T wi(i s ) = + Z T t«i(i s ). 

Thus, Z has to vanish in order to ensure A(A, l)(w) = Vw G C 1 [t s , U} d . Now, 
we search for functions A G NBV[i s ,if] d with 

A(A,0)(w)=J w(t)- f v (t,y(t))w(t)dA(t) =0 Vu> G C 1 ^, if] d . 

With g(t) := w(t) — f y (t, y(t))w(t), it is the same to vary either u; G C 1 ^, t{\ d 
or g G C°[t s ,if] , since the inhomogeneous ODE possesses a unique solution 
w(t) for every According to the uniqueness of \P in (|T2|) it holds 




g(t)dA(t)=Q Vg e C°{t s ,tt} d => A = 0. 



Consequently, AA(^4) = {(0,0)} which proves the uniqueness of the solution 
of (HSU). □ 

With this knowledge at hand we can now come to the proof of Theorem [T] 

Proof (of Theorem\j$ As seen in Section B~T1 the equations (|15b[) - (|15cp have 
the same unique solution y(t) as ((T|) which implies their well-posedness. Ac- 
cording to Lemma [U a solution of (|15ap is provided by (fT()]) . Furthermore, it 
is the only solution of (|15aj) according to Lemma [21 Since \{t) depends con- 
tinuously on J'(y(t { )) J (cf. Section l2?2j) this still holds for A(t) and Z. Thus, 
(|15a[) together with (|15b|) - (|15cj) is well-posed. □ 

With the concept of weak solutions from partial differential equations (see 
e.g. [I2]), the triple (y, A, I) is a weak solution of (JTJ and ([2]), since it solves the 
variational formulation (jT5j) of ([1]) and ©. Thus, we will call A a weafc adjoint 
solution of (J2) or shortly weak adjoint. Note that for the nominal solution, the 
weak solution y defined by (|15cj) - (|15b[) is directly the classical solution of ([TJ. 
Whereas for the adjoint, the weak solution A is sufficiently regular such that 
a classical solution of ([2]) is provided by A' = A. 

4.3 Extension of the infinite-dimensional optimality conditions 

As seen in Section l3~Tl the approximations to the solution of ([1]) obtained from 
BDF methods are not continuously diffcrentiable on the whole interval [t s ,t{] 
but rather continuous and piccewise continuously diffcrentiable. To capture 
this case, an appropriate extension of the trial space C 1 [t s , tf] d is required. To 
this end, we employ a time grid t s = to < t\ < • • • < t^ — U and a partition 
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of [i s , if] using subintervals /„ = (t n ,t n+ i] of length h n = t n +\ — t n such that 
[t s , tf] = {t s } U 7o U • ■ • U ijv-i- Choosing the trial space as 

Y[t s ,t f ] d := [y G C°[t s ,t f ] d : G Cl{I n ) d ) , (18) 

where C£(/„) is the space of all continuously differcntiable and bounded func- 
tions with bounded derivative [H Ch.l], the extended Lagrangian C : Y[t s , tf] d x 
NBY[t Sl tf] d x R d -)• E of © solved on the function space F[t s ,t f ] d is 

iV-l . 

C(y,A,l):=J(y(t { ))-J2 / V(*) - /(*, V (*)) <*A(i) - F [y(t 8 ) - y 8 ] . 

n=0 

The Lagrangian £ is based on the extension £ of the linear functional £ given 
by (fl2|) from C°[i s , tf] to F[t s , tf]. The existence of £ is guaranteed due to [26l 
p. 89]. We define the extended Ricmann-Sticltjes integral on /„ = (t n ,t n +i] 
using the partition t n < To < T\ < ■ ■ • < r m = t n +i and the convention that 
Ok = Tfe-i G bfc-i, rfe] for A; = 1, ... ,m by 



g(t)d<P(t) = lim ^ S (t,)[^)-1?(th)] (19) 



C*n,t»+lJ k= l 

such that 

JV-l 



This extension £ restricted to the continuous functions g G C°[t s ,t{] coin- 
cides with £. Thus, the same holds for the Lagrangian C. Furthermore, if 
g G C°[t n ,t n+1 ] then ff (t)d#(t) = ff(t)d#(t). 

With these definitions at hand, we first state the main result of the section. 

Theorem 2 The optimality conditions of the constrained variational problem 
© on Y[t s ,t f ] d , i.e. 

N-l . 

J'(y(t t ))w(tt) E / w(t)-f y (t,y(t))w(t)dA(t)-Uw(t a )=0, (20a) 
n=0 



JV-l 

E / y(t)-f(t,y(t))dr(t)=0, (20b) 

n=0 



-rT [y(t 8 ) - y s ] = 0, (20c) 
V(w,r,r) G F[t s ,t f ] d x NBV[t s ,t f ] d x R d , 

possess a unique solution (y,A,l) in Y[t s , tf] d xNBV[< s , tf] d xK d t/iat coincides 
with the solution of () 15[) . 



We start with considering the nominal equations (|20cH - (|20bj) . 
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Lemma 3 The solution y(t) of (|15c[) - (|15b[) solves the extended variational 
formulation ([2Dc )l -(|20b [ ). 

Proof Let y(t) be the solution of (fT5c|) - (fT5b)) . From C 1 [t s ,t f ] d C F[t s ,t f ] d 
follows that y e Y[t s , tf] d . Since the integral J t * f gi(t)dri(t) with ^(t) := y%{t) — 
fi(t,y(t)) exists, also the integrals over the subintervals L n+1 gi(t)dri(t) exist 
and it holds [HJ Ch.VIII§6] 

/ 5i (i)dA(i) = ]T / ft(t)dH(t) = E / ft(*)dT<(t) 

where the second equality is due to the extension (|19[) of the Ricmann-Stieltjes 
integral, i = 1, . . . , d. Thus, equation (|15bp becomes V-T G NBV[t s , tf] d 

r t t N-i r 

0=/ y(t)-f(t,y(t))dr(t) = J2 y(t)-f(t,y(t))dr(t) 

which coincides with (|20b[) . □ 

Lemma 4 The extended variational formulation (|20b() - (|20cl) possesses a unique 
solution y(t). 

Proof Let y(t) be a solution of (p0b ]> - (|20c ]) . The space NBV[t s ,i f ] d contains, 
in particular, the functions that vanish everywhere except on (t n ,t n +i). Thus, 
a necessary condition for y(t) being a solution of (|20bl) - (|20cl) is that each 
addend has to vanish, i.e. fj y{t) - f(t,y(t))dT(t) = VT e NBV(/„) d 
with r(t n+ i) = 0. The fundamental theorem of variational calculus yields 
it{t) — f(t, y(t)) = on (t n , t n +i) for all n = 0, . . . , N — 1. On the other hand, 
NBV[£ s ,£f] d contains also the constant functions having a single jump in t n . 
They give the necessary conditions y(t n ) — f(t n ,y(t n )) = for n = 1, . . . , N, 
Since f(t,y) is continuous in both variables and y £ C°[t s ,tf] d , y(t) is neces- 
sarily continuously differentiable on [£ s , if]. Thus, every solution of (|20b > p0c l) 
satisfies (|15b[) - (|15cp which possesses a unique solution. □ 

As conclusion of this lemma, the dependency of the solution of the extended 
variational formulation (|20b[) - (|20c[) on the input data is continuous and thus 
the problem is well-posed. 

Now, we focus on the adjoint problem in extended variational formulation 
which is for a given y(t): Find (A, I) E NBV[i s , t f ] d x R d such that l(2Da|) holds 
for all w e Y[t Sl t c ] d . 



Lemma 5 For the solution y(t) of ()20b(l - f|20c[> . the corresponding adjoint so- 
lution (A, I) E NBV[i s ,t f ] d x R d of (|2T)aj) is provided by fTBj). 
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Proof We proceed in the same way as in the proof of Lemma [TJ but choose 
w G Y[i s ,if] d for the multiplication and split the integral in (TIT)) using the 
subintervals /„ (same arguments as in the proof of Lemma [3]). Integration by 
parts of all integrals yields the equivalent equation 

jv-i „ 

-\T(t s )w(t a ) -J2 AT (*) M*) - v(*)M*)] dt + J'{y{ti))w(tt) = o. 

n=0 J 1 ™ 

Thus, the choice (fTo) provides a solution of (|20a[) . □ 



Lemma 6 for i/ie solution y(t) of (|20b|) - (|20c|) , £/ie corresponding adjoint so- 
lution (A, I) E NBV[t s ,tf] d x R d of plaj) is unique. 

Proof We follow mainly the proof of Lemma [21 Equation (|20a[) is equivalent 
to 

N-i . 

/ w(t) - fy(t,y(t))w(t)dA(t)+Pw(t B ) = J'(y(ti))w(t f ) Vw 6 Y[t a ,t f ] 

71 = " ^~ ' 



=B(w) 



=:A(A,l){w) 



where A(A,l) is also a linear functional on y[i s ,£f] d and A : NBV[i s ,if] d x 
R d (r[t s ,tf] d )' is linear in (A, I). We show again that AT (A) = {(0,0)}. 
Since C 1 ^, t[] d C F[i s , if] d , Z has to vanish due to the same arguments as used 
in the proof of Lemma [21 Thus, the following equation 

N-1 r 

A(A,0)(w)=J2 w(t)-f y (t,y(t))w(t)dA(t)=0 Vw e Y[t a , tf] d 

n=0 

has to be satisfied also for w 6 C 1 [t s ,t{] d C Y[t s ,t{] d , i.e. with g(t) := w(t) — 
f y (t,y(t))w(t) it becomes 

N-i r 

/ g(t)dA(t) = o VgeC°{t s ,t f } d . 

Furthermore, as </(i) is continuous the integral J t * f g(t)dA(t) exists and coin- 
cides with the sum of the integrals over the subintervals (same arguments as 
in the proof of Lemma [3j and the proof can be finished in the same way as 
that of Lemma [2j □ 

With all this at hand we are able to prove Theorem [5] 

Proof (of Theorem^) Lemma [3] and 0] prove the existence of a unique solution 
of H20b p -(|2Dc j) coinciding with the solution of (|15b|) - (IT5c|) . For this solution, 
equation (|20a[) has a unique solution given by (TIT))) due to Lemma [5] and El □ 
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Fig. 2 Basis function </> n of Yp [t s , if ] d with ko = 1, k„ = 2 for n > and constant stcpsizcs 
h n = h for all n. 

5 Petrov-Galerkin discretization of the extended optimality 
conditions 

In order to solve the infinite-dimensional optimality conditions (|20[) numer- 
ically, the infinite-dimensional function spaces have to be approximated by 
finite-dimensional subspaces, the finite element spaces. This so-called Petrov- 
Galerkin approximation transfers the infinite-dimensional conditions into a 
finite-dimensional system of equations which can be solved on a computer. 
The first part of the section focuses on the finite-dimensional subspace, and 
the second part is devoted to the resulting system of equations. 

5.1 Finite element spaces 

This section deals with the discretization of the function spaces Y[t s ,t{) d and 
NBV[t s ,if] d by choosing appropriate sets of basis functions. 

Trial space To discretize the trial space Y[t s , tf] d we use piecewise polynomials 
of order k n on the sub-interval /„ 

Y P [t s ,t i ) d ■■= {y e c°[t s ,t { ] d ■■ v\ In g v^\i n ) d ) . (21) 

We choose local basis functions <j> n that are composed of the fundamental 
Lagrangian polynomials (|§J) restricted to the particular subinterval. Figure [5] 
shows the basis function <f> n S Yp[t s ,tf] d with ko = 1, k n = 2 for n > and 
h n — h for all n. The support of a single basis function depends on the orders 
and contains at most seven adjacent subintervals as BDF methods are stable 
up to order 6. 

The solution y 6 Y[t s ,tf] d is then approximated by 

N 

y(t) » y h (t) := y s o (t) + ]T y n <t>n(t) 

71 = 1 

which results in N ■ d degrees of freedom {y n € M. d }^ =1 , since the initial value 
2/o = 2/s is already fixed. To achieve locally the order k n > 1, former values 
Z/n+l-fc„j ■ • ■ i yn are reused to set up the interpolation polynomial of order k n 
which is afterwards restricted to /„. 
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Test space We approximate the test space NBV[t s ,if] d using Heavisidc func- 
tions as basis functions. We choose them to be continuous from the right with 
discontinuity in t n . Thus, a function A € NBV[t s ,if] d is approximated by the 
linear combination of these basis functions in the form 

N 

A(t) a A h {t) := K^\ n H n {t) (22) 

n=l 

where the h n -\ appear for reasons which will become clear later. Note that 
A h is a step function with initial value A (t s ) = and jumps of magnitude 
h n -iX n at t n for n = 1,...,N. Thus, it is A h (t n ) = A h (t n -i) + h n -i\ n at 
the time points and A h (t) = A h (t n ) for inner points t £ (t n ,t n+ i). We denote 
this space by ZH[t s ,tf] d . 

Regarding the relation (fl~6f between the adjoint solutions A and A, the clas- 
sical derivative of A h fails to exist. But A h is still diffcrcntiable in a weak form 
such that its weak derivative is given by the Dirac measures at {t±, . . . ,t^} 
with heights {ho\i, Hn-i^n}, see e.g. [5J Sec. 4.24]. 



5.2 Finite-dimensional optimality conditions 

In this section, we approximate the infinite-dimensional optimality conditions 
(|20p by finite-dimensional equations that result from approximating the func- 
tion spaces by the finite element spaces of Section [S~T1 The resulting system 
of equations will be discussed in the following. 

Theorem 3 The discretized optimality conditions, i.e. 

J'{y h {U))w h {U) 

N-l ,. 

/ w h (t)-f y (t,y h (t))w h (t)dA h (t)-{l h yw h (t s )=Q, (23a) 

N-l 

E / y h (t)-f(t,y h (t))dr h (t) = o, (23b) 

-[r'y [y h (t s )-y s ] = 0, (23c) 



n=0 



V(w\r\r h ) e Y v [t si t t ] d x z H [t s ,t t } d 



x R d , 



are equivalent to the BDF scheme with prescribed stepsizes and orders 
together with its discrete adjoint scheme (|10|) . 

The above theorem is the main result of this section. The proof follows 
directly from the two lemmas given below. 



Lemma 7 The equations (|23b|) - (|23c[) are equivalent to the BDF scheme ([7]) 
with prescribed stepsizes and orders. 
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Proof We first consider one addend of (|23bl) 
y h (t)-f(t,y h (t))dr h (t) 
= [r h (tn+x) - r h (t n )] J {y h (t n+1 ) - f(t n+1 ,y h (t n+1 ))} 

=7n+lS y^/tn</»n+l-i(^+l) Z/n+l-i ~ h n f(t n+1 , y n+ l) > 



i=0 



where the first equality holds due to the extended Riemann-Sticltjcs integral 
(fTT)|) in vector- valued version with coefficients h n ~f n+ i of r 1 ' 1 in (|2"2")1 . The second 
equality uses the properties of the basis functions <fi n . Here the appearance of 
the h n in the coefficients of A h given by (f2"2"| becomes clear. Thus, (|23b|) can 
be written as a system of equations that is nonlinear in {y n }n=i an d linear in 

7 T := [7i T ll ■■■ In] & 



7 T 









KM 


( 


(A ® I) 













+ 












V o J 


V 



hof{h,y\) 
hif(t 2 ,V2) 



\ 



\hN-if(t N ,y N ) ) 



= 0, V 7 (24) 



where A® I denotes the Kronecker tensor product, i.e. the {N ■ d) x (N ■ d) 
matrix with d x d blocks a^- J, and the quadratic matrix A is lower triangular 
with band structure 

/4 0) 
a[ 1} 4 1} 



A = 



\ 



Equation (|24[) holds if and only if the term in the squared brackets van- 
ishes. Since A is lower triangular, each y n +\ is determined directly from 
y s , yi, . . . , y n by the nth equation of the squared brackets term in (j24[) which 
coincides with the nth step of (|7b[) . So, together with the equivalence between 
(fTaj) and (|23cp the lemma is shown. □ 



JJV-l) 



Lemma 8 For the solution y h (t) of (|23b[) - (|23c[) . ifce equation (|23a[) is equiv- 
alent to the discrete adjoint scheme (|10|) of i/ie nominal BDF scheme. 

Proof Analogously to the beginning of the proof of Lemma [3 each integral in 
(I23ap is given by 

I i=0 
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Thus, equation (|23aj) can be formulated cquivalciitly in matrix form with 
:= [wj wl ■ ■ • w T N ] e (R d ) N 



[(>••• J' (y N )] w- (a^Xi-iy w 



'h f y (h,yi) 



A® I 



hN-lfy(tN,VN) , 



w = 0, yw Q ,w (25) 



which is linear in both the variations Wq,w and the unknown A. The equiv- 
alent time-stepping scheme goes backwards in time starting with J'(yjv) — 
aQ^^Xjf + ftjv-iAjf/y^jvil/Jv) = 0. Thus, ([25j) is equivalent to (fTUjl which 
finishes the proof. □ 

The necessary conditions for the well-posedness of (|23b|) - (|23c|) are stated 
in numerous textbooks on BDF methods, for example in [23J Ch.4§3]. With 
the Lipschitz constant L of f(t, y), the sequence of stepsizes and orders has to 
satisfy \h n /aQ L\ < 1 in order to provide a unique solution y h (t) of (1231)1) - 
(|23cp . The solution depends continuously on the input data due to the stability 
of the integration scheme. Since f y (t, y) is bounded by L for all (i, y) and h ni 

k n satisfy \h n /a.Q L\ < 1, the matrix in ((25)) is non-singular and thus (|23a[) 
possesses a unique weak adjoint solution A h {t). The solution depends contin- 
uously on the input data J'{vn) since the stability of the nominal integration 
scheme is carried over to the discrete adjoint scheme [22]. The well-posedness 
of (|23a[) can also be established using the derivation of the equivalent scheme 
(fT0| by automatic differentiation of 0, cf. Section [3] 



6 Convergence analysis of classical adjoints and weak adjoints 

In this section, we focus on the asymptotic behavior of the solutions of the 
discrete adjoint scheme (|10j) . Therefore, we consider a nominal BDF method 
of constant order k with constant stepsizes h using a self-starting procedure 
for 1/1, ... , y m with m > k — 1 fixed. We will call this a non- adaptive BDF 
method. As seen in Section 13.21 the main part of the discrete adjoint scheme, 
i.e. equation (jlObp with n = N — k, . . . , m, is a consistent method of order k 
for a variant of the adjoint equation ((2J). However, the adjoint initialization 
and termination steps do not give consistent approximations. Nevertheless, 
we will prove that the approximations in the main part converge linearly to 
the exact classical solution \(t) of ([2J around the exact nominal solution 
y(t). Using this result, we then show the strong convergence of the finite 
element approximation A h {t) towards the solution A(t) of (|15a[) . i.e. to the 
weak solution of (J2J), in the total variation norm of NBV[£ S , £f] d . 
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6.1 Convergence of the discrete adjoints to the classical adjoint 

The discrete adjoint scheme (|10l) of a non-adaptive BDF scheme reads 

a \ N - J'(y N y = hf^(t N ,y N )X N (26a) 

N-l-n 

^] a l A n+ i +l = hfT(t n+1 ,y n+1 )\ n+1 , n = N - 2, . . . ,N - k (26b) 



i=0 



y^OiXn+i+j = hfZ(t n+ i,y n+ i)A n +i, n = N - k - 1, . . . ,m (26c) 

i=0 

y^q^" +t) A, 1+ i + . t = hf^(t n+1 ,y n+1 )X n+ i, n = m-l,...,0 (26d) 



i=0 



i=0 



where (|26dp accounts for the nominal starting procedure. To investigate the 
scheme (|26p purely as an integration method for the adjoint differential equa- 
tion ([2]), we consider a continuously differentiable approximation y{t) satis- 
fying y(t n ) = y n for n = 0, ...,N, for example a quadratic spline function 

interpolating {y n }n=o an< ^ {/('», yn)}n=o- With the adjoint differential equa- 
tion around y(t) 

~X(t) = fy (t, y(t)) X(t), X(t f ) = J' (y(t f )y (27) 



the main steps (|26cp can be seen as a BDF method of order k applied to 
The adjoint initialization steps (|26al) - (|26b[) can be interpreted as a starting 
procedure for (|26cj) giving inconsistent start values Xn, ■ ■ ■ , Xjy-k+i- 

In the following, we study the asymptotic behavior for decreasing h — > 
and a fixed time point t* which belongs to refining grids, i.e. for every stepsize 
h there exists an n = n(h) such that t* = t n . The interval [t m+ i,tN^k] of 
the main part of (f2"6"| increases and approaches (t s , tf ) for h — > 0. By ||-|| we 
denote any vector norm in K d . 



Lemma 9 Let f y (t,y(t)) be continuously differentiable in t £ [t s , tf ] and 
y(tn) = Vn for n = 0, . . . ,N where {y n }n=o * s computed by the non-adaptive 
BDF method of order k with constant stepsize h. Let X(t) be the exact solution 
of the adjoint differential equation (|27p and let {A„}^ =1 be computed by 
Then, for a fixed timepoint t n = t £ (t s ,t{) there exists H > such that 

X n -X(t n )\\=0(h) 

as the grid is refined with H > h — > 0. 



Proof To ease the notion, we consider a scalar initial value problem, i.e. d = 1. 
Nevertheless, the proof is also valid for systems of initial value problems. Fur- 
thermore, we define some abbreviations B(t) := fy{t, y(t)) and r\ := J'(y(tf)) T . 
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Thus, the starting procedure (|26a[) - (|26b[) can be written equivalcntly using 
A T := [ Xn ■ ■ ■ ^N-k+i ] and the k x 1 unit vector e.\ 



A - hB(t N ,h) 



A = e x j] 



where A = I [An -k+i-.N .n -fc+i:w] T I for the reverse identity matrix J and the 
matrix A from page 1171 and 



B(t N ,h) 



(B{t N ) 
\ B(t N -(k-l)h) 



B(t N )I + 0{h) 



/00 ••• 0\ 
1 



Voo 1/ 



using the Taylor series expansion of the entries B(t^ ~ ih) around ij\r. The 
matrix A is nonsingular since ao =/= 0. Furthermore, for h small enough to 



satisfy 



hA~ 1 B(tjy 7 h) 



< 1 we can use the Neumann series to express the 



inverse of J — hA 1 B(tN, h), see for example Sec. II. 1], which yields 

1 f oc - J 

A= A(l -hA^B^Nih)} em = < ^ (hA- 1 B(t N ,h)Y > A' 1 em 



3=0 



= {l + hA^BitN, h) + C(/i 2 )} Ar x e X T] 
= A _1 ei?7 + hA^Bit^A^e^ + 0(h 2 ). 



(28) 



We want to apply Theorem 4.3 of [TS] to the linear differential equation (|27[) . 
Note that the starting procedure satisfies the assumptions of the theorem due 
to (|28p . As BDF methods are strongly stable, the only essential root of the 



characteristic polynomial p(z) = J2i=o aiZ 1 ^ s * ne principal root Z\ = 1 
Thus, Theorem 4.3 of [T5] gives for certain constants K\ and K2 



K - A(i„) = cxp 



-B{t)At )6 1 +6[ Iu 



tn — h — t[ t 

where \8\ < 1 in the scalar case (||0|| < 1 for d > 1). The quantity Si is 
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^ k-l fc-1 
P ^ ' i=0 i=0 



Z-l 



and the coefficients 7^ sum up to 1, i.e. J^_o 7» = 1. The latter fact together 
with equation ((28)) gives for 7 T := [70 • • • 7fe-i 1 



Si = 7 T A - r) = 7 T A -1 eiJ7 + hA- 1 B(t N )A- 1 e 1 r) + 0(h 2 
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The coefficient ~f T A 1 e\ — 1 of the first addend vanishes which can be verified 
easily for all BDF methods up to order 6. Thus, we obtain 

A„ - A(i„) = fcexp (J ! B{T)dT^JjT A' 1 Bit^A^em 

where both coefficients are bounded which proves the assertion. □ 

The main result of this subsection is the following. 

Theorem 4 Let f{t,y) be continuously differentiable with respect to (t,y). 
Let X(t) be the exact solution of the adjoint differential equation (0) and let 
{X n }^ =1 be computed by ([2"B| . Then, for a fixed timepoint t n = t 6 (t s ,t{) 
there exists H > such that 

\\X„-X(t n )\\=<D(h) (29) 

as the grid is refined with H > h — > 0. 

Proof Let the continuously differentiable spline y(t) be composed of quadratic 
polynomials on I n such that y(t„) = y„, y(t n+1 ) = y n+1 and y(t n+ i) = 
f(t n+ i,y n+ i) for n = 0, . . . , N — 1. Furthermore, we define the interpolation 
operator X that maps a continuously differentiable function g(t) to a contin- 
uously differentiable spline Ig{t) that is composed of quadratic polynomials 
on /„ with Tg(t n ) = g(t n ), Tg(t n+1 ) = g{t n+ i) and Tg(t n+1 ) = g(t n+1 ) for 
n = 0, . . . , N — 1. Then, the difference of y(t) and Ty(t) in C°-norm is 

\\y{t)-Xy{t)\\ CO[tsMd =0(h) 

using Taylor expansions and the convergence of the nominal BDF method. 
Due to the assumption on f(t,y), the exact nominal solution y(t) of (JXJ) is 
twice continuously differentiable such that 

\\y(t)-Iy(t)\\ c0[tstt]d =O(h 2 ) 

due to the approximation property of quadratic splines. Thus, it is 

\\y(t)-y(t)\\ c °[uM« ^ \\y( t )- I y( t )\\co + \\zy(t)-y(t)\\co = o(h). (30) 

Since both adjoint differential equations © and (|2"T)l are linear, their solutions 
\(t) and X(t) can be given explicitly. Substracting the exact adjoint solutions 
and using (pKl)) yields in the C°-norm 

\(t)-X(t) =0{h) (31) 

C°[t a ,tf] d 
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which implies directly the pointwise convergence for every t £ [t s ,t{]. Thus, 
together with Lemma [5] we obtain 



||A„-A(t„)|| < K-\{t n ] 
for t n 6 (i s , if). □ 



A(tn)-A(0 =0(h) 



Remark 1 If f(t,y) is fc-times continuously diffcrentiable in (i, y), the start 
errors of the nominal BDF method of order k are small enough (i.e. the con- 
vergence of order k is guaranteed), and the spline is of corresponding order, 
then (j3Tj) holds with order k in h. 

The discrete adjoints resulting from the adjoint initialization and ter- 
mination steps differ from the exact adjoints in a constant way. For n = 
N, . . . , N — k + 1 the difference is bounded by a positive constant c„ times the 
state A(t f ) = J'(y(i f ) T , i.e. 

||A„-A(i„)|| KcnWJ'tyfaM + Oih). 

This can be shown using (|31|) . the Taylor expansion of A(i„) around if and 
the Neumann series of the inverse of I — hf y (t n+ i,y n+ i). For the discrete 
adjoints from the adjoint termination steps (|26d[) . one also needs Lemma [5] 
and obtains a multiple of A(i s ). 

Without modifications of the adjoint initialization steps (|26a[) - (|26b|) . the 
discrete adjoints on the main part converge linearly to the exact adjoint solu- 
tion A(i) of ©. Nevertheless, we still have to consider the oscillations of the 
discrete adjoints at the interval ends of [i s ,if] which are due to the inconsis- 
tency of the adjoint initialization and termination steps. We will do this in the 
next section. 



6.2 Convergence of the finite element approximation to the weak adjoint 

We will prove the convergence of the finite element approximation of the weak 
adjoint to the exact weak adjoint of © given by (|15a[) with respect to the 
total variation norm of NBV[i s ,if] d (i.e. strong convergence). 

Theorem 5 The finite element approximation A h (t) = ^2^ =1 h n -i\ n H n (t) 
given by the discrete adjoint scheme (|26[) of a non- adaptive BDF method of 
constant order k with constant stepsize h converges to the exact weak adjoint 
solution A(t) = J t A(r)dr where A(r) solves @. The convergence is with 
respect to the total variation norm of NBV[i s , t[] d . 

Proof Let h :~ * f ^* B be the stepsize of the equidistant grid. Thus, the nodes 
are t n = t s + nh for n = 0, . . . ,N. We use the norms mentioned in Section [4. II 
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and consider firstly the ith component, 1 < i < d. To ease the notion, we set 
A := Ai, A h := A\, g '■= Qi such that the dual norm reads 



\ A ~ A WnmuM = „ „ SU P 

\\a\\c°\t s ,t t ] 



' g(t)d (A - A h ) (t) 



As A is given by A(t) = f. A(r)dr and A h is a jump function it holds [HO 
Sec. 36 Example 3] 

j-tt j-t t N 

/ g(t)d(A-A h )(t)= \(t)g(t)dt-J2h\ n g(t n ). 
Jt s Jt s n=1 

Approximating the integral by the composite trapezoidal rule for equidistant 
grids yields 

fl N ^ 1 1 N 

h I -x(t )g(t ) + ]T A(t„)a(*n) + g HtN)g(t N ) > + o(h 2 ) - hX n g(t n ) 

{ " n=l ) ra=l 

= hS^X{t )g(t ) + l X (tn) - A n ] g(t n ) - ^X(t N )g{t N )^ + 0{h 2 ). 

We obtain a bound for the NBV[i s , ifj^-dual norm of A — A h by taking the 
absolute value, using the triangle inequality and the fact that ||<?||c°[t a t( i = 1, 
i.e. 

\\ A - AH \ l N BV[ t ., tf ] ^ h |l A (*0)l + E l A ^™) - A «l + l A ^)l} + °(^)- 

With Theorem [4] the sum over the main part becomes 

N-k N-k 

E |A(*n)-A„|= E 0(h) = 0(l) 

n— m+1 n—m-\-l 

such that the norm is bounded by 
I \A - A h \ I 

IK 1 71 llNBV[t 3 ,t f ] 

|A(t )| + E l A ( f «) - A «l + + E l^N-n) - XN-n\ + |A(ttf)| I + G(/i 2 ). 

n=l n=l J 

Since the magnitude of all remaining addends is bounded according to the 
end of Section |nH] and their number is independent of the step number TV, it 
is | \A — A h \ | NBV r t tf i = 0{h). As this holds for alH = l,...,d and the dual 
norm coincides with the total variation norm (cf. Section [4. ip . the assertion is 
shown. □ 
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By small modifications in the proof of Theorem [5J the assertion can be 
widened to variable stepsizes in the starting procedure. 

The uniform convergence in the total variation norm of NBV[f s , tf] d implies 
the pointwise convergence on the entire time interval which can be shown by 
utilizing the particular partition {t s ,9,t{} for an arbitrary time point 9 £ 
[i s , if]- Thus, Theorem [5] implies the pointwise convergence of A h (t) to A(t) 
on the entire time interval at least with the same convergence rate. 



7 Numerical results 

We illustrate the theoretical results with the help of a nonlinear test case with 
analytic nominal and adjoint solutions. The Catenary |12[ p. 15] is given by a 
second-order ODE 



y(t)=p^l + y(tf, P >0. 
We reformulate the initial value problem as system of first-order equations 

yi(t) = y 2 (t) 



m(t) = pV^+v^W 

and solve it on the interval [0, 2] forp = 3 and y(0) = [l/3cosh(— 3) sinh(— 3)] T . 
As criterion of interest we choose J(y(2)) = yi(2). The analytic nominal so- 
lution is 



»(*) = 



B +- cosh(pi + A) 



sinh (pi + A) 

and the analytic weak adjoint solution in the space NBV[i s ,if] 2 is 

t 



A ^ ~ y -4f ln(cosh(pi + A)) + ^ sinh(pt f + A) arctan (e pt+A ) 
where A and B are determined implicitly by the initial values. 



(32) 



7.1 Non- adaptive BDF method 

We consider a non-adaptive BDF method of constant order 2 on an equidistant 
grid with stepsize h. The self-starting procedure consists of two first-order BDF 
steps with stepsize h/2. The simulations are performed in Matlab. 

The lower row of Figure [3] compares the discrete adjoints for two different 
stepsizes h = 2 -4 and h = 2~ 6 to the analytic solution of the adjoint differen- 
tial equation. The oscillations of the discrete adjoints at the interval ends are 
due to the inconsistency of the adjoint initialization and termination steps of 
the discrete adjoint scheme with the adjoint differential equation (cf. Section 
I3.2j) . Nevertheless, the discrete adjoints converge on the open interval (0,2) 
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Fig. 3 Results of the non-adaptive BDF method for two different stepsizes. Comparison of 
the finite element approximation of the weak adjoint and the analytic weak adjoint (top) 
as well as the discrete adjoints in comparison to analytic Hilbert space adjoint (bottom) for 
different stepsizes. 



towards the analytic adjoint solution as proven by Theorem In the upper 
row of Figure the finite element approximation A h (t) is compared to the 
weak adjoint A(t) given by (|32[) . It converges on the whole time interval as 
shown by Theorem [5] 

Figure 0] shows the Euclidean norm of the difference between the analytic 
weak adjoint (|32[) and the finite element approximation, i.e. 

Error = \ \A(t) - A h (t)\\ 2 , 

evaluated at the final time t = if = 2 and at some interior time point t = 
1.25, respectively, for shrinking stepsizes. The error evaluated at the final time 
decreases at second order rate, a somewhat better behavior than predicted 
by the convergence theory of Section 16.21 This might be due to the second 
order convergence of the discrete adjoints at the initial time together with a 
possible cancellation of discrepancies of the discrete adjoints at the interval 
ends (depicted in the lower row of Figure Overall, this observation calls for 
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Fig. 4 Convergence of the finite element approximation of the weak adjoint to the analytic 
weak adjoint. Error evaluated at the final time tf = 2 and at the interior time point t = 1.25. 

a closer theoretical investigation. The error at the interior time point t = 1.25 
shows the expected linear convergence, cf. Theorem [5] and the subsequent 
comment on the pointwise convergence. 

7.2 Adaptive BDF method 

The software package DAESOL-II [2] provides an efficient realization of a 
variable-order variable-stepsize BDF method based on a sophisticated order 
and stepsize selection. Furthermore, it contains efficient ways to compute the 
discrete adjoints [3JHU2]. We solved the Catenary for two different accuracies 
(relative tolerance 10~ 4 and 10 -9 ) to get a first asymptotic impression of the 
finite element approximation of the adjoint in the case of fully adaptive BDF 
methods. The results are depicted in Figure [5j 

In areas of constant BDF order (fourth row of Figure [5]) and constant step- 
sizes (third row), the discrete adjoints converge to the analytic adjoint solution 
(second row) as seen in the right column on the interval (1, 1.7) approximately. 
On the other areas, i.e. where the order is varying and stepsize is changing, 
the discrete adjoints are highly oscillating (second row). Nevertheless, also in 
these cases, the finite element approximations A h {t) converge to the analytic 
weak adjoint solution ([52"]) on the entire time interval (first row of Figure [5]). 



8 Summary and outlook 

In this contribution, we have addressed the issue of relating the discrete ad- 
joints of variable-order variable-stepsize BDF methods to the solution of the 
adjoint differential equation Since for multistep methods the common 
Hilbcrt space setting is not appropriate to interpret the discrete adjoints, we 
have developed a new Banach space approach. It is based on a constrained 
variational problem in the space of all continuously diffcrcntiablc functions 
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Fig. 5 Results of the adaptive BDF method for different accuracies. Comparison of the 
finite element approximation of the weak adjoint and the analytic weak adjoint (top) as 
well as the discrete adjoints in comparison to analytic Hilbcrt space adjoints (second row). 
Stepsizc ratio (third row) and BDF order (bottom) of the integration scheme. 



with Lagrange multiplier in the space of all normalized functions of bounded 
variation. We have approximated the infinite-dimensional optimality condi- 
tions by a Petrov-Galerkin discretization and have shown the equivalence of 
the resulting equations to the BDF scheme and its discrete adjoint scheme ob- 
tained by adjoint internal numerical differentiation. Thus, discretization and 
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optimization commute in the presented framework and the finite element ap- 
proximation of the weak adjoint is obtained by a simple post-processing of the 
discrete adjoints. Furthermore, we have demonstrated that the discrete ad- 
joint scheme of a non-adaptive BDF method produces discrete adjoints which 
converge linearly to the solution of (J2J on the inner time interval although 
the adjoint initialization steps are inconsistent. We have used this result to 
prove the linear convergence of the finite element approximation on the entire 
time interval to the weak adjoint solution of ([2]) in the space of normalized 
functions of bounded variation. 

The theoretical results have been observed numerically using a non-adaptive 
BDF method to solve the Catenary. Additionally, we have given numerical 
evidence that the finite element approximation serves as proper quantity to 
approximate the weak adjoint also in the case of fully adaptive BDF methods, 
i.e. also in areas of variable order and variable stepsize. 

Thus, we now have a quantity at hand which can be used within global er- 
ror estimation techniques. The functional-analytic framework allows to carry 
over estimation techniques from finite element methods to BDF methods. Fur- 
thermore, the approximations to the weak adjoints can now be computed ef- 
ficiently and accurately by automatic differentiation of the efficient variable- 
order variable-stepsize BDF method without the need of explicit derivation of 
the adjoint equations. 

Acknowledgements The authors express their gratitude to Christian Kirches and Andreas 
Potschka for valuable discussions on the subject. Scientific support of the DFG-Graduatc- 
School 220 "Heidelberg Graduate School of Mathematical and Computational Methods for 
the Sciences" is gratefully acknowledged. Funding graciously provided by the German Min- 
istry of Education and Research (Grant ID: 03MS649A), and the Helmholtz association 
through the SBCancer programme. The research leading to these results has received fund- 
ing from the European Union Seventh Framework Programme FP7/2007-2013 under grant 
agreement n° FP7-ICT-2009-4 248940. 



References 

1. Adams, R., Fournier, J.: Sobolev Spaces, Pure and Applied Mathematics (Amsterdam), 
vol. 140, second edn. Elsevier/ Academic Press, Amsterdam (2003) 

2. Albersmeyer, J.: Adjoint based algorithms and numerical methods for sensitivity gener- 
ation and optimization of large scale dynamic systems. Ph.D. thesis, Ruprecht— Karls- 
Universitat Heidelberg (2010). 

3. Albersmeyer, J., Bock, H. G.: Sensitivity Generation in an Adaptive BDF-Mcthod. 
In: H. G. Bock, E. Kostina, X. Phu, R. Rannacher (eds.) Modeling, Simulation and 
Optimization of Complex Processes: Proceedings of the International Conference on 
High Performance Scientific Computing. March 6-10, 2006, Hanoi, Vietnam, pp. 15—24. 
Springer- Verlag Berlin Heidelberg (2008) 

4. Albersmeyer, J., Bock, H.G.: Efficient sensitivity generation for large scale dynamic 
systems. Tech. rep., SPP 1253 Preprints, University of Erlangen (2009) 

5. Alt, H. W.: Lincare Funktionalanalysis, 4 edn. Springer- Verlag Berlin Heidelberg (2002) 

6. Berkovitz, L.: Optimal Control Theory, Applied Mathematical Sciences, vol. 12. 
Springer- Verlag, New York (1974) 

7. Bock, H. G.: Numerical treatment of inverse problems in chemical reaction kinetics. 
In: K. Ebert, P. Deuflhard, W. Jager (eds.) Modelling of Chemical Reaction Systems, 



29 



Springer Series in Chemical Physics, vol. 18, pp. 102-125. Springer- Vcrlag, Heidelberg 
(1981). 

8. Bock, H. G.: Randwcrtproblemmcthoden zur Parameteridentifizierung in Systemen 
nichtlincarcr Diffcrcntialglcichungen, Bonner Mathematische Schriften, vol. 183. Uni- 
versitat Bonn, Bonn (1987). 

9. Bock, H. G., Plitt, K. J.: A Multiple Shooting algorithm for direct solution of opti- 
mal control problems. In: Proceedings of the 9th IFAC World Congress, pp. 242—247. 
Pergamon Press, Budapest (1984). 

10. Bock, H. G., Schlodcr, J. P., Schulz, V.: Numerik groBer Diffcrcntiell-Algebraischer 
Glcichungen - Simulation und Optimicrung. In: H. Schuler (ed.) ProzeBsimulation, pp. 
35-80. VCH Verlagsgesellschaft mbH, Weinheim (1994) 

11. Cao, Y., Li, S., Petzold, L.: Adjoint sensitivity analysis for differential-algebraic equa- 
tions: algorithms and software. Journal of Computational and Applied Mathematics 
149, 171-191 (2002) 

12. Hairer, E., N0rsett, S., Wanner, G.: Solving Ordinary Differential Equations I, Springer 
Series in Computational Mathematics, vol. 8, second cdn. Springer- Verlag, Berlin (1993) 

13. Hartman, P.: Ordinary differential equations, Classics in Applied Mathematics, vol. 38. 
SIAM, Philadelphia, PA (2002). Corrected reprint of the second (1982) edition 
[Birkhauser, Boston, MA; MR0658490 (83e:34002)] 

14. Hartmann, R.: Adjoint consistency analysis of Discontinuous Galerkin discretizations. 
SIAM J. Numer. Anal. 45, 2671-2696 (2007) 

15. Hcnrici, P.: Error Propagation for Difference Methods. Robert E. Krieger Publishing 
Co., Huntington, N. Y. (1970). Reprint of the 1963 edition 

16. Ioffe, A., Tihomirov, V.: Theory of extremal problems, Studies in Mathematics and its 
Applications, vol. 6. North-Holland Publishing Co., Amsterdam (1979). 

17. Johnson, C: Numerical solutions of partial differential equations by the finite clement 
method. Cambridge University Press, Cambridge (1987) 

18. Johnson, C: Error estimates and adaptive time-step control for a class of one-step 
methods for stiff ordinary differential equations. SIAM Journal on Numerical Analysis 
25(4), 908-926 (1988). 

19. Kolmogorov, A., Fomin, S.: Introductory real analysis. Revised English edition. Trans- 
lated from the Russian and edited by Richard A. Silverman. Prentice-Hall Inc., Engle- 
wood Cliffs, N.Y. (1970) 

20. Lucnbcrgcr, D.: Optimization by vector space methods. Wiley Professional Paperback 
Series. John Wiley & Sons, Inc., New York, NY (1969). 

21. Natanson, I.: Theorie der Funktionen einer reellen Veranderlichen. Akadcmie- Vcrlag, 
Berlin (1975). Ubcrsetzung nach der zweiten russischen Auflage von 1957, Heraus- 
gegeben von Karl Bogel, Vierte Auflage, Mathematische Lehrbiicher und Monographicn, 
I. Abteilung: Mathematische Lehrbiicher, Band VI 

22. Sandu, A.: Reverse automatic differentiation of linear multistep methods. In: C. Bischof, 
H. Biicker, P. Hovland, U. Naumann, J. Utke (eds.) Advances in Automatic Differen- 
tiation, Lecture Notes in Computational Science and Engineering, vol. 64, pp. 1-12. 
Springer- Verlag, Berlin (2008) 

23. Shampine, L.: Numerical solution of ordinary differential equations. Chapman & Hall, 
New York (1994) 

24. Walther, A.: Automatic differentiation of explicit Runge-Kutta methods for optimal 
control. Comput. Optim. Applic. 36, 83-108 (2007) 

25. Werner, D.: Funktionalanalysis. Springer- Verlag, Berlin (2000) 

26. Wloka, J.: Funktionalanalysis und Anwendungcn. Walter de Gruyter, Berlin-New York 
(1971). De Gruyter Lehrbuch 



